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LAMBERT W RANDOM VARIABLES A NEW FAMILY OF 
GENERALIZED SKEWED DISTRIBUTIONS WITH 
APPLICATIONS TO RISK ESTIMATION 

By Georg M. Goerg 

Carnegie Mellon University 

Originating from a system theory and an input/output point of 
view, I introduce a new class of generalized distributions. A para- 
metric nonlinear transformation converts a random variable X into 
a so-called Lambert W random variable Y, which allows a very flex- 
ible approach to model skewed data. Its shape depends on the shape 
of X and a skewness parameter 7. In particular, for symmetric X 
and nonzero 7 the output Y is skewed. Its distribution and density 
function are particular variants of their input counterparts. Maxi- 
mum likelihood and method of moments estimators are presented, 
and simulations show that in the symmetric case additional estima- 
tion of 7 does not affect the quality of other parameter estimates. 
Applications in finance and biomedicine show the relevance of this 
class of distributions, which is particularly useful for slightly skewed 
data. A practical by-result of the Lambert W framework: data can 
be "unskewed." 

The R package LambertW developed by the author is publicly avail- 
able (CRAN). 

1. Introduction. Exploratory data analysis regarding asymmetry in data 
is usually based on histograms and nonparametric density estimates, and 
statements such as "this data set looks almost Gaussian, but it is skewed to 
the right" or "these asset returns have heavy tails, but they are too skewed 
that a student-t would make sense" are fairly common. It is therefore natural 
to generalize symmetric distributions to allow for asymmetry. 

A prominent generalization is the skew-normal distribution [Azzalini 
(1985)], which includes the Gaussian as a special case. A skew-normal ran- 
dom variable (RV) is defined by having the probability density function 
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(p.d.f.) f(x) = 2(j>(x)$(ax), where <£(•) is the cumulative distribution func- 
tion (c.d.f.) of a standard Gaussian, and a 6 M. is the skew parameter. This 
approach to skewness has not only led to substantial research in the skew- 
normal case [Azzalini and Capitanio (1999), Arellano- Valle and Azzalini 
(2006)], but the same concept has also been used for student-t [Azzalini and 
Capitanio (2003)] and Cauchy distributions [Arnold and Beaver (2000), Be- 
hboodian, Jamalizadeh and Balakrishnan (2006)], among others. In all these 
cases, a parametric manipulation of the original symmetric p.d.f. introduces 
skewness. 

Notwithstanding the huge success of this approach to model skewed data, 
manipulating the p.d.f. to introduce skewness seems like putting the cart 
before the horse: densities are skewed, because the random variable is — not 
the other way around. Also, applied research starts with data, not with 
histograms. 

Motivated by this data-driven view on skewness, I propose a novel ap- 
proach to asymmetry: Lambert W x Fx distributions emerge naturally by 
modeling the observable RV Y as the output of a system S driven by random 
input X with c.d.f. Fx(x). Here S can either be a real chemical, physical, or 
biological system, or refer to any kind of mechanism in a broader sense. In 
statistical modeling such a system is simply represented by transformations 
of RVs. As there are no restrictions on Fx(x), this is a very general frame- 
work that can be analyzed in detail for a particularly chosen input c.d.f. 
Figure 1 illustrates the methodology. 

For instance, consider S being the stock market, where people buy and sell 
an asset according to its expected success in the future. Asset returns, that 
is, the percentage change in price, typically exhibit negative skewness and 
positive excess kurtosis — so-called stylized facts [Yan (2005), Cont (2001)]. 
The left panel of Figure 2 shows daily log-returns (in percent) y := {yt \ t = 
1, . . . , 1,413} of an equity fund investing in Latin America (LATAM 1 ). Also, 
these returns are clearly non-Gaussian given their excess kurtosis (1.201) 
and large negative skewness (—0.433) — see Table 1. The excess kurtosis is 
typically addressed by a student i-distribution, but here a Kolmogorov- 
Smirnov test still rejects y ~ t6.22 on a 5% level (even for the estimated v), 
as the empirical skewness is too large. Thus, to model the probabilistic 
properties of such data, asymmetric distributions must be used. 

Using Lambert W x F RVs to model the asymmetry in asset returns is 
perfectly suitable not only given empirical evidence of "almost student-f, 
but a little skewed data," but also by a more fundamental viewpoint. Price 
changes are commonly considered as the result of bad and good news hitting 



x Data from January 1, 2002 until May 31, 2007: R package fEcofin, data set 
equityFunds, series LATAM. 
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Fig. 1. Schematic view of the Lambert W approach to asymmetry: (left) an input/output 
system S transforms (solid arrows) input X ~ Fx to output Y ~ Lambert W x Fx 
and herewith introduces skewness; (right) inference about skewed data y — (yi, . . . ,2/jv)-' 
(1) unskew observed y to latent symmetric data x, (2) use methods of your choice (regres- 
sion, time series models, quantile estimation, hypothesis tests, etc.) for statistical inference 
based on x, and (3) convert results back to the "skewed world" ofy. 



the market: bad news, negative returns; good news, positive returns. The 
empirical evidence of negative skewness evokes the following question: why 
should news per se be negatively skewed? Or put in other words: do really 
bad things happen more often than really good things? 

In the Lambert W framework this news <->■ return relation is modeled 
under the assumption that the probability of getting negative news is about 
the same as of getting positive news, but typically people react far more 
drastically facing negative than positive ones. Thus, news X ~ Fx(x) are 
symmetrically distributed, the market S acts as an asymmetric filter, and 
the measurable/observable outcome is a skewed RV y/data y. 
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Fig. 2. Daily equity fund returns (LATAM): (left) observed returns y and estimated 
latent news xy MLE ; (right) Gaussian QQ plot and Lambert W xt QQ plot ofy. 
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Table 1 

LATAM returns y and back-transformed series x>; 
(top) summary statistics; (bottom) Shapiro-Wilk (SW) & 
Jarque-Bera (JB) normality tests, and Kolmogorov-Smirnov (KS) 
test for student-t with v degrees of freedom 



LATAM 


y 


TIGMM 


x "nviL,E 


Min 


-6.064 


-5.073 


-4.985 


Max 


5.660 


7.036 


7.268 


Mean 


0.121 


0.190 


0.198 


Median 


0.138 


0.138 


0.138 


St. dev. 


1.468 


1.456 


1.457 


Skewness 


-0.433 


0.000 


0.053 


Kurtosis 


1.201 


1.100 


1.159 


V 


6.220 


7.093 


7.048 


SW 


0.000 


0.000 


0.000 


JB 


0.000 


0.000 


0.000 


KS (tv) 


0.028 


0.088 


0.102 



Last, the right part of Figure 1 also illustrates a very pragmatic, yet use- 
ful way to exploit the Lambert W framework for (slightly) skewed data. 
If a certain statistical procedure or model assumes a symmetric — a Gaus- 
sian, as often is the case — distribution and no skewed implementation of 
this method is available, then instead of applying it to the skewed y, it is 
advisable to work with the "symmetrized" data x, make statistical inference 
about X based on x, and then transform the obtained results back to the 
"skewed world" of Y. Although this is only an approximation to the truth, at 
least this approach takes skewness into consideration instead of ignoring it. 

Section 2 defines Lambert W RVs and their basic properties are studied. 
Section 3 presents analytic expressions of the c.d.f. Gy(y) and p.d.f. gyiu), 
which are particular variants of their input counterparts. After studying 
Gaussian input in Section 4, various estimators for the parameter vector of 
Lambert W x F RVs are introduced in Section 5. Section 6 compares their 
finite sample properties and shows that additional estimation of the skew- 
ness parameter 7 does not affect the quality of other parameter estimates. 
This new class of distribution functions is particularly useful for data with 
slightly negative skewness, thus, Section 7 demonstrates its adequacy on an 
Australian athletes data set and on the LATAM return series. 

In particular, Section 7.2 shows that the input-output system (Figure 1) 
with student-t input X is a proper model for these returns. A detailed com- 
parison of quantile estimates, which are essential to get appropriate risk 
measures of an asset, confirms the aptness of Lambert W distributions (see 
Lambert W QQ plot in Figure 2). Empirical evidence for the significance of 
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conditional heteroskedastic time series models using Lambert W x F inno- 
vations concludes Section 7.2. 

Finally, Section 8 establishes a direct link of this new class of distributions 
to the existing statistics literature, noting that the square of a RV having 
Tukey's h distribution [Tukey (1977)] has a Lambert W x xl distribution. 

Computations, figures and simulations were realized with the open-source 
statistics package R [R Development Core Team (2008)]. Functions used in 
the analysis are available as the R package LambertW, which provides many 
other methods to perform Lambert W inference in practice. 

2. Lambert W random variables. The general notion of a system S 
with random input and output as shown in Figure 1 translates to a variable 
transformation in statistical terminology. 

Definition 2.1 (Noncentral, nonscaled Lambert W x F RV). Let U be 
a continuous RV with c.d.f. 

(2.1) Fu(u)=F(U <u), u£R, 
and p.d.f. fu(u). Then 

(2.2) Z :=Uexp(-fU), 7 G K, 

is a noncentral, nonscaled Lambert W x F RV with skewness parameter 7. 

If U is from a parametric family Fjj (u\ (3), where (3 parametrizes the Fjj , 
then Z is a noncentral, nonscaled Lambert W x F RV with parameter vector 

= 03,7)- 

The key of this family of RVs is 7, which can take any value on the 
real line. As exp(-) is always positive, U and Z have the same sign. For 
readability let H^{u) := uexp(ju). For 7 = transformation (2.2) reduces 
to the identity Z = U; thus, Z possesses the exact same properties as U. 
By continuity of H~(-), one can expect for 7 ^ but close, also Z but 
close. 

Transformation (2.2) indeed describes a system S with an asymmetry 
property: let U ~ Fjj(u) be a symmetric zero- mean RV, then Z is a skewed 
version of U — depending on the sign of 7. For 7 < negative U are amplified 
by the factor exp(7C7) > 1 and positive U are damped by < exp(7?7) < 1: 
Z is skewed to the left. For 7 > the same reasoning shows that Z is a pos- 
itively. 

The noncentral moments E(Z n ) equal 



(2.3) 



^(n) := J u n e^ n fu(u)du. 
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If the moment-generating function Mjj{t) := Ee exists for t = jn, then 

(2.3) can be rewritten to get a more tractable formula. As 

^-elUn = f Un \n e ~,Un 

interchanging differentiation and the integral sign yields 

(2.4) i> {n) =n- n ^Mu{ in ). 

If Mjj(t) does not exist (e.g., for student-t U), then (2.3) must be calcu- 
lated explicitly. 

2.1. Scale family input. In a typical input /output system S such as a mi- 
crophone/loudspeaker setting, the loudspeaker will be louder if speakers 
raise their voice. In this sense it is stable with respect to scaling: doubling 
the volume of the input doubles the volume of the loudspeakers — the signal 
is not affected in any other way. Viewing this system as a Lambert W x F 
RV system (where the signal is considered as a RV) , multiplying X by a fac- 
tor k, should — ceteris paribus — only affect the output Y by multiplying by k; 
other properties, such as skewness or kurtosis, should not be altered. 

Transformation (2.2), however, does not have this scaling property of U. 
Hence, to allow a comparable system characterization via 7 among different 
scalable data sets define a scaled Lambert W RV. 

Definition 2.2 (Scale Lambert W x F RV). Let U := X/a x be the 
unit- variance version of a continuous RV X from a scale family Fx(x \ (3), 
where j3 is the parameter (vector) of Fx and a x the standard deviation 
of X. Then 

(2.5) Y:={UeMlU)}a x = Xexp(jX/a x ), 7 e R, a x > 0, 
is a scale Lambert W x F RV with parameter vector 9 = 09,7). 

Transformation (2.5) is invariant to scaling of the input, for example, 
a different measurement unit for the input does not modify the asymmetry 
property of the system, but just scales the output accordingly. 

Here a x is a function of (3: for an exponentially distributed input X ~ 
exp(A), (3 = X and <J X (P) — A" 1 ; an input X having a Gamma distribution 
with shape a and rate j3 gives (3 = (a, j3) and a x ((3) = \fa/ j3. 

2.2. Location- scale family input. The focus of this work lies in introduc- 
ing skewness to symmetric RVs with support on (—00,00), such as a Gaus- 
sian or student-i. These distributions are not only scale, but also shift 
invariant, a property Lambert W x F distribution should also have for 
location-family input. However, transformation (2.5) is not shift-invariant. 
For example, consider a zero-mean and unit variance input RV U$ : f2 — > R, 
Uio := Uq + 10, and let 7 = 0.1. If Uq(uj) is close to 0, then the shifted ?7io(w) 
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will be close to 10. For the corresponding Zq(lo) and Z\q(uj) this does not 
hold: Zq{oj) is close to 0, but Z\q{uj) will not be shifted by 10, but lies close 
to 10 exp(l) « 27.183. 

Definition 2.3 (Location-scale Lambert W x F RV). Let 1 be a RV 
from a location-scale family with c.d.f. Fx(x \ (3) with mean jjl x and standard 
deviation a x ; again (5 parametrizes Fx- Let U = (X — fJ, x )/cr x be the zero- 
mean, unit- variance version of X. Then 



is a location-scale Lambert W x F RV with parameter vector 9 = (f3,j). 

As before, the parameter 7 regulates the closeness between X and its 
skewed version Y . 

For a full parametrization of a Lambert W x F distribution it is necessary 
to know 9 = 7); viewing (2.6) only as a transformation from X to Y, 
it is more natural — and in practice more useful — to only consider fj, x , a x 
and 7, ignoring the particular structure of X given its parametrization by j3. 
In order to distinguish these two cases in the remaining part of this work 
let r := {[i x ,o- x ,~f) G T = R x R + x R. Clearly, r can be computed from 9, 
t = t(9), but not necessarily vice- versa. 

For example, for a Gaussian X r = 9 since /J> X (P) = fi x and cr x (f3) = a x . 
In contrast, for a location-scale student-i input with /3 = (c, s, v) — where c 
is the location, s the scale and v the degrees of freedom parameter — r 7^ 9: 



Thus, below I use either 9 if the full parametrization is important or r if 
it is sufficient to consider (2.6) only as a transformation rather than a fully 
specified parametric distribution. 

Notation 2.4 (Lambert W x F RV). For simplicity I will refer to all Y 
in Definitions 2.1, 2.2 and 2.3 as a Lambert W x F RV. Which one of the 
three transformations (2.2), (2.5) or (2.6) is used to generate Y will be 
clear from the type of input X. For example, since a xl distribution does 
not have location or scale parameters, a Lambert W x x| RV refers to Y in 
Definition 2.1; the exponential distribution is a scale family, thus, a Lambert 
W x exp(A) RV Y is defined in Definition 2.2; and for Gaussian input X, 
the corresponding Lambert W x Gaussian Y refers to Definition 2. 3. 2 

2.3. Latent variables. So far attention has been drawn to Y and its prop- 
erties given X and 9 (or r). Now consider the inverse problem: given Y and 9 
(or only r), what does X look like? 



2 Although technically not correct, one can think of a scale Lambert W x F transfor- 
mation having r = (0, cr m ,"f), and a noncentral, nonscaled Lambert W x F transformation 
having r = (0, 1,7). This is especially useful for empirical work and implementation of the 
methods involving Lambert W x F RVs. 



(2.6) 



Y := {U exp{^U)}a x + n : 



7€R, 
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Corresponding event in U: 
W_,(z)<U<W (z) 

^MzT^^i W„(z) / 


/ 


Event :(Y-n x )/a x <z - 
z = -1/e 




z<0 


i i i i 





-6 -5 -4 -3 -2 -1 

u = W(z) 



Fig. 3. Lambert W function: transformation H(u) and the two inverse branches ofW(z) 
for z < 0: "principal branch (dashed curve) and nonprincipal branch (dotted curve). 



This is not only interesting for a latent variable interpretation of X, but 
the inverse of a transformation is essential to derive the c.d.f. of the trans- 
formed variable. Before analyzing transformation (2.6), consider the nonlin- 
ear transformation H : C — > C, u t— > uexpu =: z [Figure 3 shows H(u) only 
for uGl]. For positive u the function is bijective and resembles exp(u) very 
closely. For negative u, however, H{u) is quite different from exp(u): it takes 
on negative values, its minimum value equals — , and — most importantly — 
it is nonbijective. 

Although H{u) has no analytical inverse [Rosenlicht (1969)], its implicitly 
defined inverse function is well known in mathematics and physics. 

Definition 2.5 (Lambert W function). The many-valued function W{z) 
is the root of 

(2.7) W(z)e w W=z, z£C, 

and is commonly denoted as the Lambert W function. 

Generally the Lambert W function is defined for any z € C. Since Lambert 
W RVs are only defined for real- valued outcomes, in this work the domain 
and image of the Lambert W function is restricted to the reals. For z € 
[— oo,— 1/e) no real solution exists; for z £ [— 1/e, oo) W(z) is a real-valued 
function. If z G [— 1/e, 0), there are two real solutions: the principal branch 
Wo(z) > — 1 and the nonprincipal branch W-i(z) < — 1; for z £ [0, oo) only 
one real- valued solution exists, Wq{z) = W-i(z) (see Figure 3). 

For a detailed review including useful properties and functional identities 
of W(z) see Corless et al. (1996), Valluri, Jeffrey and Corless (2000) and the 
references therein. 
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Figure 3 also shows how skewness is introduced via transformation (2.6). 
Symmetric input X (x-axis) is mapped to asymmetric output Y (y-axis) 
due to the curvature of H(u). Analogously, mapping values from the y- 
axis to the x-axis "unskews" them. Figure 3 shows H^(u) : it i— >■ z for 7 = 1, 
thus, its inverse is Lambert's W function (W(z) :zh>m). The curvature of 
H^{u) = uexp^u) depends on the skewness parameter: for 7 = no cur- 
vature is present [Hq(u) = it]; higher 7 results in more curvature, and thus 
more skewness in Y. 

It can be easily verified that W^(Z) := W^Z)/^ is the inverse function 
of transformation (2.2). Hence, given Y and r, the unobservable input X 
can be recovered via 



For empirical work it is important to point out that (2.8) does not require 
specific knowledge about Fx or j3; fx x and a x (and 7) suffice. This will 
become especially useful for estimating the optimal inverse transformation — 
see Section 5.2. 

Remark 2.6 (Nonprincipal branch). The Lambert W function has two 
branches on the negative real line (Figure 3), so transformation (2.8) is not 
unique. For example, consider z = —0.25 and 7 = 1. The two real- valued so- 
lutions are W (-0.25) » -0.357 and W_i(-0.25) ~ -2.153. Assuming a sta- 
ble input /output system, only the principal branch makes sense 3 — denoted 
by W 7) o(-)- If the nonprincipal solution is required, Wy-i(-) will be used. 

The probability p_i that the observed value Z(ui) was indeed caused by 
the nonprincipal solution is at most ¥{U < — I/I7I}, since H~(u) changes its 
monotonicity at u = — 1/7. For Gaussian X and 7 = 0.1 — a very large value 
given empirical evidence — this probability equals 7.62 • 10 -24 . For an input 
with student t-distribution and v = 4 degrees of freedom p_i ~ 7.26 • 10 -5 . 
Hence, ignoring the nonprincipal root to obtain unique latent data should 
not matter too much in practice. 

Algorithm 1 describes the empirical version of (2.8). The so obtained 



is the input data x r generating the observed y and should have c.d.f. Fx{x). 
Here (•) does not stand for an estimate of r, but since W 7i o(-) ignores the 

3 The output is assumed to be similar to the input, but skewed. Therefore, the input 
values causing the output should lie close to them: observing z = —0.25, it is more rea- 
sonable to assume that this corresponds to the close input of Wo(— 0.25) w —0.357 rather 
than the very extreme W-i(— 0.25) fs —2.153. 



(2.8) 




(2.9) 




n = l,...,N, 
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Algorithm 1 Get input x T : function get . input (•) in the LambertW package. 
Input: data vector y; parameter vector r = (fi x ,a x ,j). 
Output: input vector x T . 
1: z = (y - (J, x )/a x . 

2: back-transform z via the principal branch to u = W 7) o(z). 
3: return x r = ua x + fi x . 



nonprincipal branch, Algorithm 1 need not return the "true" input data x — 
even if r is known. 4 For small 7, x T will most likely equal the true x for all n; 
for large 7 some yj's might be falsely assigned to the principal Xj's, although 
these yj's were actually caused by nonprincipal x^s. For an estimate r the 
notation will be used, which itself is an approximation to x r . 



3. Distribution and density function. For ease of notation and readabil- 
- — — , u := W 7j o(z), u-i := Wj-i(z), 

0~x 

UoVx+Hx, X-l ■= U-i<T x + fa. 

By definition, 

G Y (y) = P(y <y) = n{U*MlU)}o x + fi x <y) 

= P([/exp7C7 < z). 

The transformation H^(u) changes its monotonicity at u = — 1/7 and 
its inverse W^^z) at z = —l/(je). Consequently the event {Y < (>) y} 
[for 7 > (<) 0] has to be split up into separate events in U to derive the 
distribution of Y . 



ity let 

z := 

(3.1) 

x : = 



Theorem 3.1 (Distribution of a location-scale Y). The c.d.f. of a loca- 
tion-scale Lambert W x F RV Y equals (for 7 > 0) 

0, ify<-— + ^x, 

7e 

(3.2) G Y (y 1/3,7)= \ F x (x | 0) - F x (x^ \/3), if + ^ x <y < ^ 

je 

Fx(%a\P), ify>Hx- 
The case 7 < can be obtained analogously and for 7 = it is clear that 
G Y (y\P,0)=F x (y\(3). 



4 This only applies if p_i = F(U < — tK) > 0, as otherwise the back-transformation 
W-y(z) — W 1: o(z) is bijective. In particular, if U > — for example, for scale family input 
X > — then x T = x, not just an approximation. See also Corollary 3.3. 
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Proof. Follows by matching the events in Z with the corresponding 
events in U [Glen, Leemis and Drew (1997)]; see Figure 3. □ 

For z = — l/(7e) both branches of W(z) coincide, thus, no = U—\. There- 
fore, Fx{uqo x + fi x ) — Fx(u-\a x + fi x ) = at z = —l/(je), which implies 
continuity of Gy (y) at y = fi x — ^ ; the same reasoning shows continuity at 

'ye 

y = fix (z = 0). 

Theorem 3.2 (Density of a location-scale V). The p.d.f. of a location- 
scale Lambert W x F RV Y equals (for 7 > 0) 

( 0~ x 

0, ify< VHx, 

7e 



(3.3) g Y (y |/3, 7 ) 



f x (x I P) ■ W>( 1Z ) - f x (x-! I (3) • WL^-yz), 

if — - < y < tlx, 

7e 



Jx(x \P)-Wo('yz), if y> fix- 
Again, 7 < can be obtained analogously, and gy(y \ /3,0) = fx(y \ P)- 

PROOF. Using that ^W" 7 (z) = W'^z), the first derivative of Gy(y) 
with respect to y equals (3.3). The same arguments as for Gy(y) show 
that gy(y) is continuous at y = —a x j{^e) + fi x and y = fi x . □ 

In general, the support of Y depends on r = t{6) G T if 7 7^ 0. However, 
restricting r to the subspace S c := {r G T \ — cr x /(7e) + fi x = c} gives the 
same support [c, 00) for all r G C T [or (— 00, c] for 7 < 0]. Of particular 
empirical importance are 

(3.4) S :={9£&\ i i x =a x /( 1 e)} and S ±OQ = {9 G 6 | 7 = 0}. 

For (a scale family) X ~ Fx (a? | /3) taking values in [0, 00) and 7 > 0, the 
support of the corresponding (scale) Lambert W x F RV Y does not depend 
on r but always equals [0, 00). 

Corollary 3.3 (C.d.f. and p.d.f. of a scale Lambert W x F RV). If X 
is a nonnegative RV taking values in [0, 00) and 7 > 0, then the inverse 
transformation W^(y/o~ x ) is unique. Hence, the c.d.f. and p.d.f. of a scale 
Lambert W x F RV Y equal 



(3.5) Gy(y |/3,7) = F x [ W 7)0 { 3- )a x 



P 



and 

(3.6) Mv\P,7) = fx(Wi,o(jfj'<Tx\p) - W °(^£)- 
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(a) Lambert W x xl (b) Lambert W x exp(A) (c) Lambert W x J\f(n,a 2 ) 
with p = k = 3. with = A = 1. with /3 = (/x, or) = (0, 1). 



Fig. 4. Thep.d.f. andc.d.f. of (a) a "noncentral, nonscaled," (b) a "scale" and (c) a "Zo- 
cation-scale" Lambert W x 7?F Y for different degrees of skewness. 

PROOF. Follows by setting [i x = in (3.1), (3.2) and (3.3), and noting 
that the case u < does not exist since X > 0. □ 

For the c.d.f. and p.d.f. of a noncentral, nonscaled Lambert W x F RV Y 
(Definition 2.1) with X taking values in [0,oo) set a x = 1 in (3.5) and (3.6). 

Theorems 3.1 and 3.2 demonstrate the great flexibility of the Lambert W 
setting, since the closed form expressions for Gy{y \ /3,7) an d 9y{v \ A 7) 
hold for any well-defined input Fx(x \ (3) and fx(x \ /3), respectively. Thus, 
researchers can easily create Lambert W variants of their favorite distri- 
bution Fx, by simply plugging Fx and fx in (3.2) and (3.3). Figure 4 
shows the p.d.f. and c.d.f. of the three Lambert xF RVs discussed in No- 
tation 2.4 for four degrees of skewness, 7 = (0,0.1,0.2,0.4). For 7 = the 
output Y equals the input X, thus, also their p.d.f.s/c.d.f.s coincide (solid 
black lines). With increasing 7, the RV Y — and thus its distribution and 
density — become more and more skewed to the right (since 7 > 0). 

Although Lambert W RVs are defined by transformation (2.6), they can 
be also considered as a particular variant of an arbitrary Fx — independent of 
this transformation. Sometimes the input/output aspect might be more in- 
sightful (e.g., stock returns), whereas otherwise solely the generalized distri- 
bution suffices to analyze a given data set. Especially, if the latent variable X 
does not have any suitable interpretation (see BMI data in Section 7), one 
can concentrate on the probabilistic properties of Y, ignoring the input X. 
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3.1. Quantile function. Equation (3.2) and an inspection of Figure 3 
directly relate fj, x to Y. 

COROLLARY 3.4 (Median of Y). For a location-scale Lambert W RVY , 
^{Y < n x ) = P(X < fj, x ) for all 7GR. 
In particular, /i x equals the median ofY, if X is symmetric. 

Proof. The transformation H^{u) = uexp(7u) passes through (0,0) for 
all 7GI. Furthermore, exp(7u) > for all 7 and all u € M. Therefore, 

W(Y < fjk) = W(Z < 0) = P([/exp(7C7) < 0) = P(U < 0) 

= P(X<fi x ). 

For symmetric input ¥(X < fi x ) = i, therefore, \i x is the median of Y. □ 

Corollary 3.4 not only gives a meaningful interpretation of the parame- 
ter \x x for symmetric input, but the sample median of y also yields a robust 
estimate of fi x . 

In general, the a-quantile y a of Y satisfies 

a = F(Y < y a ) =w(lJexp(jU) < Va ~ ^ 

= P(J7exp(7C7) <z a ). 

For 7 > (7 < analogously) and z a > the function W 7i o(-) is bijective. 
Thus, 

F(Uexp( 7 U) < z a ) = ¥(U < W lfl {z a )) = P(X < W lfi {z a )a x + ^ x ) 

and by definition of the a-quantile of X , 

(3.7) x a = W 1)0 (z a )a x + fj, x & z a = u a e 1Ua , 

where u„ = Xa ~ flx . 

For z a < and 7 > 0, however, W 7 (-) is not bijective, thus, z a cannot be 
computed explicitly as in (3.7), but must be obtained by solving the implicit 
equation 

a = F x {W lfi (z a )a x + n x ) - F x {W 1 -i(z a )a x + fi x ). 
In either case, the a-quantile of Y equals y a = z a a x + \i x . 

4. Gaussian input. The results so far hold for any continuous input RV. 
To get a better insight consider Gaussian input U ~ M{^ U i °~u) as a special 
case; here f3 = (/i u ,<7 u ). Its moment generating function Mjj(t) equals 

E(e tu ) = e ^+t 2 /2al for all t e R _ 
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Therefore, noncentral moments of Z can be computed explicitly [see (2.4)] 

by 

^2 



d n ( 2 n l ai 

+ 7 " 



^H = n n ^exp(jn Mu 
In particular, 

o* = 2 - 2 (e 2 ^ +2 ^^ ((47a 2 + 2/, u ) 2 + 4a 2 )) - + 7 ^) 2 e 2 ^+^ 2 

= e 2^+2 7 V 5 ((27CT 2 + Mu) 2 + ^ _ ^ + 7(T 2 )2e 2 Wu+7 V 2 _ 



As already mentioned in Section 2, this is an unstable system, in the sense 
that a small perturbation in o~ u ) results in a completely different (fi z ,a z ) 
for 7/O. 

In contrast, the central moments of a location-scale Lambert W x Gaussian 
RV Y with input X ~ AA(/i x ,<T 2 ) have a much simpler and stable form 

(4.1) tht = fJh + (r x M(Ue^ u ) =fx x + a xl e^/\ 

since U = (X — [i x )/a x ~A/"(0, 1). Using (4.1), the kth central moment of Y 
can be expressed by the kth central moment of U e^ u , 



E(Y - fi y ) k = <J k x M{Ue 



7t/_ 7e 7 2 /2)fc_ 



In particular, 

(4.2) £J 2 =CT 2 e72((47 2 + 1)e72 _ 7 2 )) 

which only depends on the input variance and the skewness parameter 7. 

The main motive to introduce Lambert W RVs is to accurately model 
skewed data. The skewness coefficient of Y is defined as 71 (Y) := (E(Y — 
My) 3 ) / ' a y ■ Analogously, the kurtosis equals 72 (Y) := (E(Y — H y ) A )/o-y and 
measures the thickness of tails of Y . 

Lemma 4.1. For a location-scale Lambert W x Gaussian RV with input 



(4.3) 7i(7) =7 
and 

(4.4) 72(7) 



e^ (9 + 27 7 2 ) - e-y (3 + 12 7 2 ) + 5 7 2 

( e 72(l+4 7 2)_ 7 2)3/2 

e 6 ^ 2 (3 + 967 2 + 2567 4 ) - e 3 ^ 2 (3O7 2 + 6O7 4 - 967 s ) - 37 4 
(e7 2 (l + 4 7 2 ) - 7 2 ) 2 ' 
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Fig. 5. Pearson skewness (a) and kurtosis (c) coefficient for 7 6 [—1,1] o-nd its first 
order Taylor approximation (dashed line); (b) and (d): zoom to the interval [—0.15,0.15]. 



Proof. Dividing the third and fourth derivative of the moment gener- 
ating functions for a standard Gaussian U at t = with respect to 7 by n n 
gives 

97 2 /2 = 97e 97 2 /2 + 27 7 3 e 9 ^/ 2 = 9 7 e 9 ^ 2 (l + 3 7 2 ), 



3 3 (i7 3 
1 d 4 



, e 8 7 2 = 3e 8 7 2 + 967 2 e 87 2 + 2 56 1 4 e 8 ~* 2 = e 8 ^ 2 (3 + 96 7 2 + 256 7 4 ). 



4 4 d 7 4 

The rest follows by expanding E(Ue"f u - EUe' u ) 3 and E{Ue< u - EUe^ u ) 4 
via the binomial formula and using the above expressions. □ 



As expected, the skewness coefficient is an odd function in 7 with the 
same sign as 7. On the contrary, 72(7) is even. A first and second order 
Taylor approximation around 7 = yields 71(7) = 67 + 0(7 3 ) and 72(7) = 
3 + 6O7 2 + C(7 4 ), respectively. Although 7 can take any value in R, in 
practice, it rarely exceeds 0.15 in absolute value. In this interval the Tay- 
lor approximation is almost indistinguishable from the true function (Fig- 
ure 5). 

This first order approximation to 71(7) offers a rule of thumb 

(a <\ - 71 (y) 

(4.5) 7Taylor := g , 

which can be used as a starting value for better algorithms such as IGMM 
and MLE (see Section 5). 



Corollary 4.2. The skewness and kurtosis coefficient are unbounded 
for 7 — > ±00, that is, 

lim 71(7) = ±00 and lim 72(7) =00. 

7— >±oo 7— >±oo 
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Proof. Omitting — 7 2 in the denominator and 57 2 in the numerator of 
the skewness coefficient can be bounded from below 

9e 37 2 + 277 2 e 3 ^ 2 - 3e^ - 12 7 + 5 7 2 e 3 ^ 2 (9 + 27 7 2 ) - e^ 2 (3 + 12 7 2 ) 
( e 7 2 + 4 7 2 e^ 2 - 7 2 ) 3 / 2 ~ e ( 3 / 2 )^ 2 (1 + 4 7 2 ) 3 / 2 

_ (3/2)t2 9 + 27 7 2 

(l + 4 7 2)3/2 

2/2 3 + 12 7 2 

(l + 4 7 2)3/2- 

As the exponential function dominates rational functions, the first term 
tends to oo, whereas the second one goes to for 7 to oo. 

In case of the kurtosis coefficient, the term e 67 in the numerator domi- 
nates all other terms for large 7 and thus determines the asymptotic behavior 
of 7 2( 7 ) for 7 to ±oo. □ 

This result shows that the Lambert W x Gaussian distributions can be 
used to model a larger variety of skewed data than a skew-normal distribu- 
tion, since its skewness coefficient is restricted to the interval (—0.995,0.995) 
[Azzalini (1985)]. 

5. Parameter estimation. For a sample of N independent identically dis- 
tributed (i.i.d.) observations y = (yi, . . . ,ijn), which presumably originates 
from transformation (2.6), 9 = (/3, 7 ) has to be estimated from the data. 
In addition to the commonly used maximum likelihood estimator (MLE) 
for 9, I also present a method of moments estimator for r that builds on the 
input/output relation in Figure 1. 

5.1. Maximum likelihood estimation. The log- likelihood function in the 
i.i.d. case equals 

N 

(5.1) t(e\y)=Y, l °g9Y( yi \9), 

i=l 

where gy(- \ 9) is the p.d.f. of Y — see (3.3). The MLE is that 9 = (/3,7) 
which maximizes the log-likelihood 

Civile = argmax^(0 | y). 

9 

Since gyilli \ &) is a function of fx(%i \ P), the MLE depends on the spec- 
ification of the input density. In general, this multivariate, nonlinear opti- 
mization problem must be carried out by numerical methods, as the two 
branches of VF(z) for y < \i x do not allow any further simplification. 
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For (scale) Lambert W x Fx with support in (0, oo) and 7 > 0, however, 
9Y(y |/3, 7) = fx(W yfi (%) • 0* | /3) • W£( 7 £) (Corollary 3.3). Thus, (5.1) 
can be rewritten as 

N , . 

(5.2) £(/3, 7 I y) = lip I x (0)ff:ci7) ) + ^ log ^ 7^ , 

1 V ^ / 



where 



(5.3) ^|x( 0) «T B)7 )) = X)log/x(w r 7l o(^) - ^1^) 

is the log-likelihood of the back-transformed data X( 0jCr:I . j7 ) = W 7i o(^-) • ffx 

[no (•) since the inverse is unique in this case]. Note that ^(7^-) only 
depends on cr x ((3) (and 7), but not necessarily on every coordinate of (3. 

The equivalence (5.2) shows the relation between the exact MLE (/3,7) 
based on y and the approximate MLE (3 based on x^ ^ xn y. if we would 
know a x and 7 beforehand, then we could just back-transform y to X( 0j(7:cj7 ) 

and compute /3 MLE based on X( 0j(T:C)7 ) [maximize (5.3)]; however, in practice, 
a x and 7 have to be estimated from y and this uncertainty enters the log- 
likelihood (5.2) by the additional term Y^i=i l°gWo(7jr)- 

For z > it can be easily shown that W'{z) = z ^\y\z)) > as wen as 
W'(z) < 1 since W'(0) = 1 and W"(z) = -W'(z) exp(-W(z)) -^^L < 0. 
Hence, Y17=i^°S^o(7 ^) < for 7 > and can be thought of as a penalty 
for transforming y to the "nicer" x^g^) with estimated parameters: the 
larger 7, the bigger the penalty on the log-likelihood £(@ \ x^ (Txrf \) of the 
"nice" back-transformed data, since W"(z) < 0. 



Parameter- dependent support. For location-scale Lambert W x F RVs 
the support of gyiu) depends on r = t(8) and therefore violates a crucial 
assumption of most results related to (asymptotic) properties of the MLE. 
Only for 7 = the support of gyiv) = fx{y) does not depend on 9. For 
X ~ A/"(0, 1) it can be shown that the Fisher information matrix In{i) = 
—E,(£^£(6 I y)) = 8N. Hence, for the symmetric Gaussian case V^7mle — > 
Af(0, |). Simulations in Section 6 confirm this asymptotic result and suggest 

that also for the general Gaussian case $mle is well behaved, that is, it is 
V^/V-consistent and asymptotically efficient. 

A theoretical analysis of the asymptotic behavior of the MLE for 7 7^ is 
beyond the scope of this study, but simulations show that also for parameter 
dependent support #mle is an unbiased estimator with root mean square 
errors comparable to the 7 = case. 
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5.2. Iterative generalized method of moments (IGMM). A disadvantage 
of the MLE is the mandatory a-priori specification of the input distribution. 
In practice, however, it is rarely known what kind of distribution is a good 
fit to the data, even more so if the data is transformed via a nonlinear trans- 
formation. Thus, here I present an iterative method to estimate the optimal 
inverse-transformation (2.8) by estimating r directly, instead of estimating 9 
and then computing t{6). This method builds on the input/output aspect 
and only relies upon the specification of the theoretical skewness of X . 

The proposed estimator for r works as follows (see below for a more 
detailed discussion): 

(1) set starting values = tq. Set k = 0; 

(2) assume and ax are known and estimate 7 from = y ~r k x ) to 
obtain ^ k+1 ^ (Algorithm 2); 

(3) assume ^ k+1 ^ is known and get estimates /J, x k+1 ^ and a x k+l ^ from the 
back-transformed data x, (*,) a w T (fc+i)\ (Algorithm 3). Set k = k + 1; 

(4) iterate between (5.2) and (5.2) until convergence of the sequence . 

For a moment assume that fi x and a x are known and only 7 has to be 
estimated. S ince fx x and o~ x are known, we can consider z — — — A natural 
choice for 7 is the one that results in back-transformed data u 7 = W^(z) 
with sample skewness equal to the theoretical skewness of U, which equals 
the theoretical skewness of X. Formally, 

(5.4) 7gmm = argmin||7i(X) - %(vL)\\, 

7 

where || • || is a proper norm in M, for example, ||s|| = s 2 or ||s|| = \s\. 

Discussion of Algorithm 2. For example, let y be positively skewed data, 
7i(y) > 0) an d the input x causing the observed y is assumed/known to be 
symmetric, thus, 71 (AT) = 0. By the nature of transformation H^(u), the 



Algorithm 2 Find optimal 7: function gamma_GMM ( • ) in the LambertW pack- 
age 

Input: standardized data vector z; theoretical skewness 71(A). 
Output: 7gmm as in (5.4). 
1: Compute lower and upper bound for 7: lb = — cxp (i)max(z) anc ^ u ^ = 

cxp(l) min(z) ' 

2: 7gmm = argmin 7 ||7i(u) — 7i(A)|| where u = VF 7 (z) subject to 7 £ 

[lb,ub\. 
3: return 7gmm- 
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skewness parameter 7 must be also positive and the Taylor approxima- 
tion of 71(7) for Gaussian input [see (4.5)] gives a good initial estimate 
7o = 7i(y)/6 > 0. In the same way as the mapping u 1— > uexp^u) intro- 
duces skewness, the inverse transformation W-y(z) results in less positively 
skewed u 7 due to the curvature in W 7 (-) (see Figure 3). As the initial guess 70 
rarely gives exactly symmetric input, Algorithm 2 searches for a 7 such that 
the empirical skewness of u 7 is as close as possible to the "true" skew- 
ness 71 (AT). 

There are natural bounds for 7 to guarantee the observability of y, for 
example, a 7 too large makes large negative observations in y impossible 
(due to the minimum at z = — 1/e; see Figure 3). However, since y has 
actually been observed, the search space for 7 must be limited to the in- 
terval O z := \ ttt~ — rr, / J . If there exists a 7 G O z such that 

* L exp(lj max(z) ' exp(l) min(z) 1 1 * 

7i(u 7 ) = 71(A), then Algorithm 2 will return 7 = 7 due to the monotoni- 
cally increasing curvature of i^ 7 (n) and W 1 {z) respectively; if there is no 
such 7 G O z , then Algorithm 2 returns either the lower or upper bound of O z , 
depending on whether z is negatively or positively skewed. 

This univariate minimization problem with constraints can be carried out 
by standard optimization algorithms. 

In practice, \x x and a x are rarely known but also have to be estimated from 
the data. As y is shifted and scaled ahead of the back-transformation W 7i o(-) 5 
the initial choice of \i x and a x affects the optimal choice of 7. Therefore, the 
optimal triple (fi x ,& x ,j) must be obtained iteratively. 

Discussion of Algorithm 3. Algorithm 3 first computes z^ = (y — fi x k ^)/ 
a x using fix and a x from the previous step. This normalized output can 
then be passed to Algorithm 2 to obtain an updated 7( fc+1 ) := 7gmm- Us- 
ing this new j( k+1 \ one can back-transform to the presumably zero- 
mean, unit- variance input u( fc+1 ) = W^(k+i)(z^). Herewith we can obtain 

a better approximation to the "true" latent x by x^ fc+1 ) = u( fc+1 ) o~ x k ^ + fi x k \ 
However, 7( fc+1 ) — and therefore x^ fc+1 ) — has been obtained using fix and 

(k) 

o~ x which are not necessarily the most accurate estimates in light of the 
updated approximation x ( k ) (k) (fe+ i )v Thus, Algorithm 3 computes new 

estimates fi x k+1 ^ and a x k+l ^ by the sample mean and standard deviation of 
x\ (fe) (fe) ( fe+ i)\> and starts another iteration by passing the updated nor- 

malized output z( fc+1 ) = y ~f^ +1 ) to Algorithm 2 to obtain a new 7(^+2) . 

X 

The algorithm returns the optimal tigmm once the estimated parameter 

triple does not change anymore from one iteration to the next, that is, if 
|| T (fc)_ r (*+i)|| <toL 

A great advantage of the IGMM estimator is that it does not require 
any further specification of the input except its skewness. For example, no 
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Algorithm 3 Iterative generalized method of moments: function IGMM(-) in 
the LambertW package. 

Input: data vector y; tolerance level tol; theoretical skewness 71 (X). 
Output: IGMM parameter estimate tigmm = {pxi^xil)- 
1: Set r(" x ) = (0,0,0). 

2: Starting values: = (/ix , <ri ,7^), where fi x °^ = y and = &y are 

the sample median and standard deviation of y, respectively. 7^ = 
7i(y)-7iP0 ^ gee (4 5) for detailg 

3: k = 0. 

4: while ||r( fc ) -r( fc-1 )|| > tol do 

5: z( fc ) = (y-/i fc) )M fc) , 

6: Pass z( fc ) to Algorithm 2 — ^-y( fe + 1 ) ) 

7: back-transform to u( fc+1 ) = WL(t|i)(zW); compute x( fc+1 ) = 

8: update parameters: /ii fc+1 ^ =Xfc + i and =CT Xfc+1 , 
9: T ( fc+ l) = ( ^ + l) )fJ (^l) )7 ( fc+ l) ); 

10: k = k + l. 

11: return tigmm=t^ ■ 



matter if the input is normally, student-i, Laplace or uniformly distributed, 
the IGMM estimator finds a r that gives symmetric xp independent of the 
particular choice of (symmetric) Fx(-)- 

A disadvantage of IGMM from a probabilistic point of view is its deter- 
mination. In general, Algorithm 3 will lead to back-transformed data with 
sample skewness identical to 71 (A") and so no stochastic element remains in 
the nature of the estimator. 5 Note that IGMM does not provide an estimate 
of (3 (except for Gaussian input); if necessary, an estimate of (3 must be 
obtained in a separate step, for example, by estimating (3 from the back- 
transformed data x? . However, in general, /3mle estimated only from x^ 
is (slightly) different from /3 MLE using Lambert W MLE on the original 
data y: in the first case r is assumed to be known and fixed, whereas in the 
second case (3 and r are estimated jointly [see (5.2)]. 

The underlying input data x = (27, . . . , xn) can be approximated via Al- 
gorithm 1 using tigmm- The so obtained x^ GMM may then be used to check 
if X has characteristics of a known parametric distribution Fx(x \ (3), and 
thus is an easy, but heuristic check if y follows a particular Lambert W x Fx 
distribution. However, such a test can only serve as a rule of thumb for var- 



5 If 71 (X) depends on one or more parameters of the distribution of X (e.g., Gamma), 
then the IGMM algorithm must be adapted to this very problem. 
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ious reasons: (i) f /t, thus tests are too optimistic as Xf will have "nicer" 
properties regarding Fx than the true x would have; (ii) ignoring the non- 
principal branch alters the sample distribution of the input — putting no 
observations to the far left (or right): not so much of a problem for small 7, 
the distribution can be truncated considerably for large 7. For Gaussian in- 
put various tests are available [Jarque-Bera, Shapiro-Wilk, among others; 
see Thode (2002)], for other distributions a Kolmogorov-Smirnov test can 
be used. 6 

5.2.1. Gaussian IGMM. For Gaussian X the system of equations 

(5.5) fiyi'y) =fi x + a x ^e 1 ' 2/2 , 

(5.6) CT 2 (7)=a 2 e72((47 2 + 1)e72 _ 72) 

has a unique solution for (fJ, x ,&x)- Given 7igmm and the sample moments ~fi y 
and a y , the input parameters fi x and a x can be obtained by 

2- v_ 3 



(5-7) £ x (7igmm) 

e7lGMM (( 4 7fGMM + l)e 7 «3MM - 7igmm) 

(5.8) ,M7igmm) = ~P y - 3 : ^(7iGMM)7iGMMe ::i;i2GMM/2 . 
Hence, line 8 of Algorithm 3 can be altered to 

(5.9) 8b: ^ x k+l) = jl x (jly,ay,<yk+i) and cj x k+r) = a x {a y , 7/0+1), given 
by (5.7) and (5.8). 

Even though this simplification would lead to a faster estimation of r, it 
is mostly of theoretical interest, as it cannot be guaranteed that X indeed 
is Gaussian; the more general Algorithm 3 should be used in practice. 7 

6. Simulations. Although the c.d.f., p.d.f. and moments of a Lambert W 
RVs are nontrivial expressions, their simulation is straightforward (Algo- 
rithm 4). 

This section explores the finite-sample properties of estimators for 6 = 
(^x, 7) under Gaussian input X ~ A/"(/i x , cr 2 ). 8 In particular, conventional 
Gaussian MLE (estimation of fi y and a y only; 7 = 0), IGMM and Lambert 
W x Gaussian MLE, and — for a skew competitor — the skew-normal MLE 9 



6 If the data does not represent an independent sample (as usual for financial data), 
then critical values of several test statistics need not be valid anymore and adapted tests 
should be used [see Weiss (1978)]. 

7 A11 numerical estimates tigmm reported in Section 6 were obtained using the more 
general algorithm with line 8, not 8b. 

8 For the special case of Gaussian input r = 6, thus, IGMM estimates tigmm = #igmm 
can be compared directly to #mle- 

9 Function sn.mle in the R package sn. 
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Algorithm 4 Random sample generation: function rLambertW(-) in the 
LambertW package. 

Input: number of observations n; parameter vector (3; specification of the 

input distribution Fx(x | /3); skewness parameter 7. 
Output: random sample (yi, . . . ,y n ) of a Lambert W x F RV. 
1: Simulate n samples x = (x±, . . . , x n ) ~ Fx(x \ (3). 

2: Compute Hx(fl) and cr x (/3) given the type of Lambert W x F distribution 
(noncentral, nonscale; scale; location-scale). 

3: U=(x- / U z (/3))/ C T x .(/3). 

4: z = uexp(7u). 

5: return y = za x (/3) + /i x (/3). 



are studied. Whereas a comparison of accuracy and efficiency in 7 does not 
make sense, it is meaningful to analyze j2 y and a y of skew-normal versus 
Lambert W x Gaussian MLE. 

Scenarios. Each estimator is applied to 3 kinds of simulated data sets 
for 4 different sample sizes of N = 50, 100, 250 and 1,000: 

7 = 0: Data is sampled from a symmetric RV Y = X ~ AA(0,1). Does 
additional estimation of 7 affect the properties of fi y or a y ? 

7 = —0.05: A typical value for financial data, such as the LATAM returns 
introduced in Section 1. 

7 = 0.3: This large value reveals the importance of the two branches of 
the Lambert W function. How does the skew-normal MLE handle extremely 
skewed data [7(0.3) = 1.9397]? 

Simulations are based on n = 1,000 replications. The input mean fi x and 
standard deviation a x are chosen such that the observed RV has ^(7) = 
and 0^(7) = 1 for all 7. These functional relations can be obtained by (5.7) 
and (5.8). For IGMM the tolerance level was set to tol = 10 -6 and the 
Euclidean norm was used. 

Remark 6.1. The Gaussian and skew- normal MLE estimate the mean 
and standard deviation of Y . Both Lambert W methods estimate the mean 
and standard deviation of the latent variable X plus the skewness param- 
eter 7. Thus, for a meaningful comparison the implied estimates a y {11 x ,^f) 
and jl y (ji x ,d x ^) given by (5.5) and (5.6) are reported below. 

6.1. Symmetric data: 7 = 0. This parameter choice investigates if impos- 
ing the Lambert W framework, even though its use is superfluous, causes 
a quality loss in the estimation. Furthermore, critical values can be obtained 
for the finite sample behavior of 7 under the null hypothesis of a symmetric 
distribution. 
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Table 2 





Bias and RMSE 


of 8 for 7 = 


and X r- 


-JV(0,1) 








i\ 




Bias 






RMSE • VN 


7 = 


fly = 


(Ty — 1 


7 = 


/J.y — 


(Ty — 1 


Gaussian ML 


50 


0.0000 


0.0054 


-0.0175 


0.0000 


0.9943 


0.7053 




100 


0.0000 


0.0016 


-0.0084 


0.0000 


0.9812 


0.7410 




250 


0.0000 


-0.0029 


-0.0009 


0.0000 


0.9997 


0.6917 




1,000 


0.0000 


0.0005 


-0.0013 


0.0000 


0.9788 


0.7105 


IGMM 


50 


-0.0015 


0.0054 


-0.0060 


0.4567 


0.9945 


0.7059 




100 


-0.0012 


0.0015 


-0.0030 


0.4368 


0.9813 


0.7405 




250 


0.0001 


-0.0017 


-0.0009 


0.4210 


0.9997 


0.6919 




1,000 


0.0003 


0.0005 


-0.0008 


0.4014 


0.9788 


0.7102 


Lambert W ML 


50 


-0.0013 


0.0054 


-0.0126 


0.5144 


0.9951 


0.7210 




100 


-0.0013 


0.0016 


-0.0072 


0.4670 


0.9813 


0.7407 




250 


0.0002 


-0.0017 


-0.0027 


0.4333 


0.9997 


0.6922 




1,000 


0.0003 


0.0005 


-0.0012 


0.4039 


0.9788 


0.7106 


Skew-normal ML 


50 


NA 


0.0052 


-0.0135 


NA 


0.9928 


0.7149 




100 


NA 


0.0015 


-0.0073 


NA 


0.9821 


0.7409 




250 


NA 


-0.0018 


-0.0027 


NA 


1.0004 


0.6925 




1,000 


NA 


0.0000 


-0.0013 


NA 


0.9788 


0.7105 



Table 2 displays the bias and root mean square error (RMSE) of 9. Not 
only are all estimators unbiased, but they also have essentially equal RMSE 
for Jiy and a y . It is well known that the Gaussian MLE of o~ x is only asymp- 
totically unbiased, but for small samples it underestimates the standard 
deviation, whereas a method of moments estimator such as IGMM does 
not have that problem (see N = 50). For 7 the IGMM estimator has slightly 
smaller RMSE than MLE for small iV; for large N the difference disappears. 
This can also be explained by an only asymptotically unbiased MLE for o~ x , 
and the functional relation (4.2) of 7, a x and a y . 

Overall, estimating 7 has no effect on the quality of the remaining param- 
eter estimates, if the data comes from a truly (symmetric) Gaussian distribu- 
tion. A Shapiro Wilk Gaussianity test on the n = 1,000 estimates of 7igmm 
and 7mle gives p-values of 68.91% and 68.25%, respectively (N = 1,000), 
and thus confirms the asymptotic normality of 7 as stated in Section 5.1. 

6.2. Slightly skewed data: 7 = —0.05. This choice of 7 is motivated by 
real world data — in particular, asset returns typically exhibit slightly nega- 
tive skewness [71 (-0.05) = -0.30063]. 

Table 3 presents the effect of ignoring small asymmetry in data. Gaussian 
MLE is by definition biased for 7, but Jl y and a y are still good estimates. 
Neither IGMM nor Lambert W MLE gives biased 6, but the RMSE of a y in- 
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Table 3 

Bias and RMSE of 6 for 7 = -0.05 and X ~ N^Qy), al{i)) 



Bias RMSE • t/N 





N 


7 = -0.05 


fJ.y—0 


tTy — 1 


7 = -0.05 


fJ-v = 


<Ty — l 


Gaussian ML 


50 


0.0500 


0.0057 


-0.0176 


0.3536 


0.9952 


0.7350 




100 


0.0500 


0.0016 


-0.0083 


0.5000 


0.9826 


0.7741 




250 


0.0500 


-0.0029 


-0.0009 


0.7906 


0.9981 


0.7095 




1,000 


0.0500 


0.0005 


-0.0014 


1.5811 


0.9781 


0.7281 


IGMM 


50 


-0.0008 


0.0046 


-0.0057 


0.4582 


0.9954 


0.7410 




100 


-0.0008 


0.0011 


-0.0026 


0.4389 


0.9828 


0.7753 




250 


0.0002 


-0.0019 


-0.0007 


0.4189 


0.9982 


0.7102 




1,000 


0.0003 


0.0005 


-0.0009 


0.3986 


0.9780 


0.7276 


Lambert W ML 


50 


-0.0043 


0.0052 


-0.0116 


0.5113 


0.9961 


0.7570 




100 


-0.0029 


0.0015 


-0.0062 


0.4701 


0.9829 


0.7802 




250 


-0.0006 


-0.0017 


-0.0024 


0.4282 


0.9981 


0.7114 




1,000 


0.0001 


0.0005 


-0.0013 


0.3992 


0.9781 


0.7284 


Skew-normal ML 


50 


NA 


0.0073 


-0.0136 


NA 


1.0011 


0.7490 




100 


NA 


0.0026 


-0.0067 


NA 


0.9834 


0.7811 




250 


NA 


-0.0014 


-0.0025 


NA 


0.9990 


0.7109 




1,000 


NA 


0.0000 


-0.0012 


NA 


0.9796 


0.7281 



creases for all estimators and all sample sizes. Again IGMM presents smaller 
RMSE for 7 than MLE for small N, but not for large N — for the same rea- 
son as in the 7 = case. Notably, the skew-normal MLE for fj, y and a y is 
also unbiased and has the same RMSE as the Lambert W and Gaussian 
competitors, even though the true distribution is a Lambert W x Gaussian, 
not a skew-normal. 

6.3. Extremely skewed data: 7 = 0.3. In this case, the Lambert W MLE 
should work better than the skew-normal MLE, since the skewness coefficient 
7i(0.3) = 1.9397 lies outside the theoretically possible values of skew-normal 
distributions. Furthermore, the nonprincipal branch of the Lambert W func- 
tion becomes more important as p-\ ~ 4.29 • 10~ 4 , so the Lambert W MLE 
should also outperform IGMM, which ignores the nonprincipal solution. 

Only the skew-normal MLE fails to provide accurate estimates of location 
and scale for heavily skewed data sets; all other estimators are practically 
unbiased (Table 4). The RMSE for a y almost doubled compared to the 
symmetric case, and for Gaussian as well as skew-normal MLE it is increasing 
with sample size instead of decreasing. While 7igmm has less bias, 7mle has 
a much smaller RMSE: not ignoring the nonprincipal branch more than 
compensates the finite sample bias in a x . Surprisingly, the RMSE for 7 has 
diminished by about 35% over all sample sizes compared to the symmetric 
case. 
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Table 4 

Bias and RMSE of 9 for 7 = 0.3 and X ~ N(/j, x (j), 0^(7)) 



Bias RMSE • ViV 





N 


7 = 0.3 


Hy=0 




7 = 0.3 




(Ty — 1 


Gaussian ML 


50 


-0.3000 


0.0029 


— 0.0336 


2.1213 


0.9851 


1.2941 




100 


-0.3000 


0.0006 


— 0.0194 


3.0000 


0.9863 


1.3957 




250 


-0.3000 


-0.0076 


-0.0056 


4.7434 


1.0045 


1.4499 




1,000 


-0.3000 


0.0002 


-0.0013 


9.4868 


0.9883 


1.4954 


IGMM 


50 


-0.0076 


0.0081 


-0.0057 


0.4374 


0.9917 


1.2417 




100 


-0.0055 


0.0028 


-0.0063 


0.4005 


0.9853 


1.2440 




250 


-0.0032 


-0.0012 


-0.0056 


0.3647 


1.0009 


1.2204 




1,000 


-0.0026 


-0.0003 


-0.0049 


0.3197 


0.9820 


1.1992 


Lambert W ML 


50 


0.0180 


0.0221 


0.0266 


0.3844 


1.0152 


1.2385 




100 


0.0115 


0.0131 


0.0168 


0.3241 


1.0095 


1.2218 




250 


0.0055 


0.0053 


0.0054 


0.2747 


1.0102 


1.1535 




1,000 


0.0000 


0.0021 


-0.0021 


0.2349 


0.9818 


1.1383 


Skew-normal ML 


50 


NA 


0.0695 


-0.0938 


NA 


1.3638 


1.1965 




100 


NA 


0.0558 


-0.0834 


NA 


1.3508 


1.3182 




250 


NA 


0.0520 


-0.0748 


NA 


1.4865 


1.5577 




1,000 


NA 


0.0560 


-0.0704 


NA 


2.1588 


2.4585 



Discussion. Estimation of \i y is unaffected by the value of 7; the quality 
of a y , however, depends on 7: the larger 7, the greater the RMSE of a y . 
For 7 = the Lambert W methods perform equally well as Gaussian MLE, 
whereas for nonzero 7 Gaussian and — to some extent — skew-normal MLE 
have inferior qualities compared to the Lambert W alternatives. In particu- 
lar, the RMSE for a y increases with sample size. 

Hence, there is no gain restricting analysis to the (symmetric) Gaussian 
case, as the Lambert W framework extends this distribution to a broader 
class, without losing the good properties of Gaussian MLE. For little asym- 
metry in the data, both the Lambert W and the skew-normal approach 
give accurate and precise estimates of location, scale and skewness. Yet for 
heavily skewed data (skewness greater than 0.995 in absolute value), the 
skew-normal framework fails not only in theory, but also in practice to pro- 
vide a good approximation. 

Table 5 shows the average number of iterations the IGMM algorithm 
needed to converge: for increasing sample size it needs less iterations — 
sample moments can be estimated more accurately; more iterations are 
needed for larger 7 — as the starting value for 7 is based on the Taylor 
expansion around 7 = and moving away from the origin makes the initial 
estimate 7W :=7Tayior less precise. 

A closer look at the two sub-tables (top and bottom) shows that find- 
ing the optimal 7 (Algorithm 2) becomes much more difficult for increas- 
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Table 5 

Average number of iterations (tol = ): (top) IGMM Algorithm 3 including the 
iterations in Algorithm 2; (bottom) IGMM only (not counting iterations in Algorithm 2). 
(left) Gaussian input; (right) student-t input with t/ = 4 degrees of freedom. Based on 

n = 1,000 replications 



7 



N 





-0.05 


0.3 


0,1/ = 4 


-0.05,1/ = 4 


0.3,1/ = 4 


50 


8.39 


10.24 


34.65 


15.82 


16.76 


26.91 


100 


6.05 


8.16 


35.01 


17.52 


19.12 


21.45 


250 


4.37 


6.45 


27.51 


17.83 


21.34 


13.23 


1,000 


3.58 


4.96 


18.43 


17.20 


24.58 


6.49 


50 


4.43 


4.60 


6.24 


4.56 


4.64 


6.23 


100 


4.10 


4.44 


6.44 


4.46 


4.44 


5.78 


250 


3.90 


4.26 


5.92 


4.15 


4.19 


5.45 


1,000 


3.58 


4.11 


5.91 


3.78 


4.04 


5.36 



ing 7 and sample size ./V than finding the optimal \i x and a x given the 
optimal 7gmm (Algorithm 3). For 7 = and large N there is almost no 
difference between the total number of iterations (top) and the number of 
iterations in Algorithm 3 only (bottom). For large 7, however, the total num- 
ber of iterations is approximately 5 times as large. The right panel shows 
the values for simulations of a Lambert W x t RV with v = 4 degrees of 
freedom. For small 7, finding 7gmm takes much longer than for Gaussian 
input; surprisingly, for large 7 convergence is reached faster. This is proba- 
bly a result of the constrained optimization: due to more extreme values for 
a i-distribution, Algorithm 2 often returns one of the two boundary values 
for 7gmm without even starting the optimization process. 

Given its good empirical properties, fairly general assumptions about the 
input variable X, and its fast computation time, the IGMM algorithm can 
be used as a quick Lambert W check. For a particular Lambert W x F 
distribution, the Lambert W x F MLE gives more accurate results, especially 
for heavily skewed data. 

7. Applications. This section demonstrates the usefulness of the pre- 
sented methodology on real world data. In the first example I analyze parts 
of the Australian Athletes data set 10 which can be typically found in the 
literature on modeling skewed data [Genton (2005), Azzalini and Dalla-Valle 
(1996)]. 

The second example reexamines the LATAM returns introduced in Sec- 
tion 1. A Lambert W x t-distribution is found to give an appropriate fit, 



R package LambertW, data set AA. 



LAMBERT W RANDOM VARIABLES 



27 




1 1 1 i 1 r ° i 1 1 1 1 

20 40 60 80 100 15 20 25 30 35 



Female Athletes kg/m 2 

Fig. 6. Australian Athletes BMI: (left) observed data y (dots) and back-transformed 
data Xf MLE (triangles); (right) histogram plus density estimates. 

both for the raw data as well as the standardized residuals of an auto- 
regressive conditional heteroskedastic time series model (see Section 7.2.1 
for details). In particular, a comparison of risk estimators (Value at Risk) 
demonstrates the suitability of the Lambert W x F distributions to model 
financial data. 

7.1. BMI of Australian athletes. Figure 6 shows the Body Mass Index 
(BMI) of 100 female Australian athletes (dots) and Table 6 lists several 
statistical properties (column 1). Although the data appear fairly Gaus- 
sian, its large positive skewness makes both tests reject normality on a 5% 
level. 

After 5 iterations tigmm = (21.735,2.570,0.099), which implies ft y = 
21.992, o y = 2.633, and 7i(7igmm) = 0.601, assuming Gaussian input. 

Table 6 

BMI (y ) and back-transformed data xp IGMM 
and Xf MLE : (top) summary statistics; (bottom) 
Shapiro- Wilk (SW), Jarque-Bera (JB) normality tests 



BMI 


y 


X nGMM 


X t(«MLe) 


Min 


16.750 


15.356 


15.406 


Max 


31.930 


29.335 


29.384 


Mean 


21.989 


21.735 


21.742 


Median 


21.815 


21.815 


21.815 


St. dev. 


2.640 


2.570 


2.569 


Skewness 


0.683 


0.000 


0.017 


Kurtosis 


1.093 


0.186 


0.187 


SW 


0.035 


0.958 


0.959 


.IB 


0.001 


0.877 


0.874 
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Table 7 

Lambert W x Gaussian MLE for the BMI data 





Estimate 


Std. error 


t value 


Pr(>|t|) 


fJ-x 


21.742 


0.274 


79.494 


0.000 


(7x 


2.556 


0.188 


13.618 


0.000 


7 


0.096 


0.039 


2.481 


0.013 



The BMI data set consists of exactly n = 100 i.i.d. samples and Table 2 
lists finite sample properties of 7igmm for this case. 11 Thus, if Y = BMI was 
Gaussian, then 

(7.1) V1007IGMM y 

v ; 0.4368 v ' 

Plugging 71GMM =0.099 into (7.1) gives 2.279 and a corresponding p-va- 
lue of 0.0113. Thus, 7igmm is significant on a 5% level, yielding an indeed 
positively skewed distribution for the BMI data y. 

As both tests cannot reject Gaussianity for x^ IGMM , a Lambert W x 
Gaussian approach seems reasonable. Table 7 shows that all estimates are 
highly significant, where standard errors are obtained by numerical eval- 
uation of the Hessian at the optimum. As not one single test can reject 
normality of xp MLE (triangles in Figure 6), an adequate model to capture 
the statistical properties of the BMI data is 

BMI = (Ue om9U )2.556 + 21.742, U = X ~ 2L742 „ ^(0, 1). 

2.556 

For #mle the support of BMI lies in the half-open interval [11.967, 00). As 
all observations lie within these boundaries, #mle is indeed a (local) max- 
imum. Figure 6 shows the closeness of the Lambert W x Gaussian density 
to the histogram and kernel density estimate, whereas the best Gaussian is 
apparently an improper approximation. 

Although a more detailed study of athlete type and other health indica- 
tors might explain the prevalent skewness, the Lambert W results at least 
support common sense: the human body has a natural physiological lower 
bound 12 for the BMI, whereas outliers on the right tail — albeit, in principle, 
also having an upper bound — are more likely. 



11 Although y is clearly not Af(0, 1), the location-scale invariance of Lambert W x 
Gaussian RVs makes this difference to scenario 1 in the simulations [Y = X ~ jV(0, 1)] 
irrelevant; finite sample properties of 7 do not change between X ~ 7V(0, 1) and general 
X ~ Af(fi x ,a^.), since in both cases fj, x and a x are also estimated. 

12 The lower truncation of the BMI at 11.967 corresponds to a 180 cm tall athlete only 
weighing 38.88 kg. 
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Table 8 

Lambert W x t MLE for the LATAM series 





Estimate 


Std. error 


t value 


Pr(>|t|) 


c 


0.197 


0.037 


5.270 


0.000 


s 


1.240 


0.057 


21.854 


0.000 


V 


7.047 


2.196 


3.208 


0.001 


7 


-0.053 


0.014 


-3.860 


0.000 



7.2. Asset returns. A lot of financial data, also the LATAM return series 
introduced in Section 1 (Table 1 and Figure 2), display negative skewness 
and excess kurtosis. These so-called stylized facts are well known and typ- 
ically addressed via (generalized) auto-regressive conditional heteroskedas- 
tic (GARCH) [Engle (1982), Bollerslev (1986)] or stochastic volatility (SV) 
models [Melino and Turnbull (1990), Deo, Hurvich and Lu (2006)]. A theo- 
retical analysis of Lambert W x F time series models, however, is far beyond 
the scope and focus of this work. For empirical evidence regarding the use- 
fulness and significance of Lambert W x F distributions in GARCH models 
and possible future research directions see Section 7.2.1. It is also worth 
noting that the Lambert W x F transformation (2.2) resembles SV models 
very closely, and connections between the two can be made in future work. 

Based on the news o- return interpretation in a stock market S, it makes 
sense to assume a symmetric input distribution Fx{x) for the latent news 
RV X. Without specifying the symmetric Fx(x) any further, the IGMM 
algorithm gives a robust estimate for r: here tiqmm = (—0.048,0.190, 1.456). 
Column 2 of Table 1 shows that the unskewed data x^ GMM — here interpreted 
as news hitting the market — is non-Gaussian, but a t-distribution cannot be 
rejected. In consequence, Y is modeled as a Lambert W x location-scale t- 
distribution with (3 = (c, s, v), where c is the location, s the scale and v the 
degrees of freedom parameter. Table 8 shows that all coefficients of 6>mle 
are highly significant; in particular, 7 increased substantially (in absolute 
value), as 7 now solely addresses asymmetry in the data, and v can capture 
excess kurtosis. Thus, the prevalent negative skewness in the LATAM daily 
returns is not an artifact of large outliers in the left tail of an otherwise 
symmetric distribution, but a significant characteristic of the data. 

In order to check if the Lambert W x i-distribution is indeed an ap- 
propriate model Jbr y, it is useful to study the back-transformed data Xf ; 
here tmle '■= t(#mle) = (0.197,1.465,-0.053). Not surprisingly, the skew- 
ness of x^ 1LE reduced to almost (column 3 of Table 1). As a Kolmogorov- 
Smirnov test cannot reject a student t-distribution for x^ LE , the Lam- 
bert W x t-distribution 



Y = (Ue-° mdU ) 1.465 + 0.197, 
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n 





F 



-6 



-4 



-2 







2 



4 



6 



8 



Log-news (in %) 



Fig. 7. News x? MLE -f-> return y scatter plot plus histograms; solid 45° line: 7 = 0. Dashed 
vertical and horizontal lines represent the sample mean of x? MLE and y, respectively. 



is an adequate unconditional probabilistic model for the LATAM returns y. 

The effect of news xt in the market S is clearly shown in a scatter plot 
of Xf MLE versus y. For example, consider the lower-left point (X1346, Z/1346) ~ 
(—4.8,-6.1) in Figure 7. Here, the observed negative return equals —6.1%, 
but as 7 = —0.053 < 0, this outcome was an overreaction to bad news that 
was only "worth" —4.8%. For location-scale Lambert W RVs the skew- 
ness parameter 7 is a powerful, yet easy way to characterize different mar- 
kets/assets. The negative 7 shows that this specific market (system) is ex- 
aggerating bad news, and devalues positive news. 

Value at risk (VaR ). The VaR is a popular measure in financial statistics 
to estimate the potential loss for an investment in an asset over a fixed 
time period. That is, the maximum percentage an investor can expect to 
lose — with a confidence of 1 — a — over a fixed time period. Statistically this 
corresponds to the a-quantile of the distribution. The VaR can be obtained 
in various ways: the simplest are empirical and theoretical quantiles given the 
estimated parameter vector of a parametric distribution (which are sufficient 
for comparative purposes). 



U = 



X- 0.197 
1.465 
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Table 9 

VaR comparison for the LATAM series 



a 


Method 0.005 


0.01 


0.05 


0.5 


0.95 


0.99 


0.995 


empirical —4.562 


-4.078 


-2.478 


0.138 


2.344 


3.192 


3.818 


Gaussian —3.660 


-3.294 


-2.293 


0.121 


2.535 


3.535 


3.901 


t -4.297 


-3.634 


-2.214 


0.121 


2.455 


3.875 


4.538 


Lambert W x t -4.871 


-4.049 


-2.358 


0.197 


2.351 


3.437 


3.893 


Skew-t -4.715 


-3.973 


-2.364 


0.201 


2.346 


3.465 


3.957 



As expected, a Gaussian distribution underestimates both the low and 
high quantiles, as it lacks the capability to capture excess kurtosis (see Ta- 
ble 9). The t-distribution with Pmle = 6.22 degrees of freedom has heavier 
tails, but underestimates low and overestimates high quantiles: clearly an 
indication of the prevalent skewness in the data. The Lambert W x t and the 
skew ^-distribution 13 are the best approximation to the empirical quantiles: 
both heavy tails and negative skewness are captured (see also the Lambert 
W x t QQ plot in Figure 2). There is no clear "winner" between the two 
skewed distributions: skew-t quantiles are closer to the empirical ones for 
small a, Lambert W xt quantiles are closer for large a. Around the median 
(a = 0.5) both skewed distributions are far away from the true value: the 
reason being a high concentration of close to returns in financial assets, 
so-called "inliers" [see Breidt and Carriquiry (1995)]. 

7.2.1. Nonindependence of financial data. It is well known that financial 
return series yt typically exhibit positive auto-correlation in their squares yj , 
which violates the independence assumption of the MLE presented in Sec- 
tion 5.1. A standard parametric way to capture this dependence is a GARCH 
model [Bollerslev (1986), Engle (1982)], which models the variance at time t, 
of, as a function of its own past. A simple, yet very successful model for an 
uncorrelated yt is a GARCH(1, 1), 

y t = n + e t a t , 

of = w + ay\_ x + /3<rf_i , 

where Et is a zero-mean, unit-variance i.i.d. sequence [for technical details 
see Nelson (1990), Engle (1982)]. Typically, s t ~ JV(0, 1), but also student t- 
or skew t-distributions are used for more flexibility in the conditional distri- 
bution of Et given the information set £lt-i available at time t — 1 [Bauwens 
and Laurent (2005)]. French, Schwert and Stambaugh (1987) also found that 



13 MLE estimates are (0.917,1.422,-0.799,7.156) for the location, scale, shape and 
degrees of freedom parameter respectively; function st.mle in the sn package. 
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the standardized residuals (yt — V)/ot — which can be considered as an i.i.d. 
sequence — still exhibit negative skewness after fitting a Gaussian GARCH 
model to S&cP 500 returns. 

After fitting a student-t GARCH(1,1) model 14 to the LATAM return 
series y, the Lambert W x t MLE fit for the standardized residuals — which 
are approximately i.i.d. and thus do not violate the MLE assumptions — 
still gives a highly significant 7 = —0.048 with a p-value of 0.000113 (other 
estimates are not shown here). 

While I will not study Lambert W x student-t GARCH models in detail, 
this example and the great flexibility of Lambert W x F distribution com- 
bined with the possibility to symmetrize skewed data suggest that Lambert 
W x F GARCH (and SV) models are a promising area of future research. 

This analysis confirms previous findings that negative skewness is an im- 
portant feature of asset returns. For example, optimal portfolio models based 
on skewed distributions lead to better suited decision rules to react to asym- 
metric price movements. It also shows that Lambert W distributions model 
the characteristics of financial returns as well as skew t-distributions, with 
the additional option to recover symmetric latent data, which is not pos- 
sible for RVs based on a manipulation of the p.d.f. rather than a variable 
transformation. 

8. Relation to Tukey's h distribution. During the final review process, 
Professor Andrew F. Siegel suggested possible connections of Lambert W 
distributions to Tukey's g-h distribution [Tukey (1977)] 



where {7~A/"(0,1). Here g is the skew parameter and h controls the tail 
behavior of Z. 

Although the underlying idea to introduce skewness is the same, the spe- 
cific transformations to get the skewness effects are different, and so are the 
properties of the transformed RVs. 



becomes symmetric. The RV Z has Tukey's h distribution and is commonly 
used to model heavy-tails [Fischer (2006), Field (2004)]. Equation (8.2) re- 
veals a close link of Lambert W x F RVs to the existing statistics literature 
by noting that if Z ~ h, then Z 2 = U 2 e hU has a noncentral, nonscaled Lam- 
bert W x xf distribution with 7 = h. 



(8.1) 




h>0 



For g -> 0, 



(8.2) 




Function garchFit (■) in the fGarch package. 
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For further important connections between the Lambert W function and 
Tukey's h distribution see Goerg (2011). 

9. Discussion and outlook. Whereas the Lambert W function plays an 
important role in mathematics, physics, chemistry, biology and other fields, 
it has not yet been used in statistics. Here I introduce it in an input /output 
setting to skew and "unskew" RVs and data, respectively. 

Successful application to biomedical and financial data together with the 
great flexibility with respect to the type of input RV X of Lambert W x F 
RVs promise a wide range of applications as well as theoretical studies for 
particularly chosen input distributions. 

Last but not least, a very pragmatic advantage of the transformation- 
based Lambert W x F RVs compared to other approaches to asymmetry: 
data can be "unskewed" using Lambert's W function. 
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